How Programs Execute: CPU, RAM & Memory Management
Overview
When you run a program, an intricate dance occurs between the CPU, RAM, and storage. This document explains exactly how your computer transforms code into actions.
Computer Architecture Fundamentals
CPU Architecture
Inside the CPU
Key CPU Components
| Component | Purpose | Typical Latency |
|---|---|---|
| Registers | Store immediate data for processing | 1 cycle (~0.3 ns) |
| L1 Cache | First-level cache, fastest memory | 3-4 cycles (~1 ns) |
| L2 Cache | Second-level cache | 10-20 cycles (~3 ns) |
| L3 Cache | Third-level cache, shared | 30-70 cycles (~10 ns) |
| RAM | Main system memory | 100-300 cycles (~100 ns) |
| SSD | Solid state storage | ~50,000 ns |
| HDD | Hard disk drive | ~5,000,000 ns |
Program Execution Flow
From Storage to Execution
Memory Hierarchy
CPU Instruction Cycle (Fetch-Decode-Execute)
Example: Adding Two Numbers
Instruction: ADD R1, R2, R3 (R1 = R2 + R3)
1. FETCH:
   - PC = 0x1000 (program counter points to the instruction)
   - Load the instruction from memory address 0x1000
   - PC = 0x1004 (advance to the next instruction)
2. DECODE:
   - Opcode: ADD
   - Operand 1: R2 (register 2)
   - Operand 2: R3 (register 3)
   - Destination: R1 (register 1)
3. EXECUTE:
   - Read the value in R2 (e.g., 10)
   - Read the value in R3 (e.g., 20)
   - ALU performs: 10 + 20 = 30
4. STORE:
   - Write the result (30) to R1
   - Update flags (zero flag, carry flag, etc.)
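To make the four phases concrete, here is a minimal sketch in C of a toy register machine stepping through the same ADD R1, R2, R3 instruction. The `Instruction` and `Cpu` structs, the opcode value, and the index-based program counter are all invented for illustration and do not correspond to any real instruction set.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy encoded instruction: an opcode plus three register numbers. */
typedef struct { uint8_t opcode, dst, src1, src2; } Instruction;
enum { OP_ADD = 1 };

typedef struct {
    uint32_t pc;                  /* program counter (indexes instructions here,
                                     unlike the byte addresses in the text above) */
    int32_t regs[8];              /* R0..R7 */
    const Instruction *memory;    /* pretend instruction memory */
} Cpu;

static void step(Cpu *cpu) {
    /* FETCH: read the instruction the PC points at, then advance the PC. */
    Instruction inst = cpu->memory[cpu->pc];
    cpu->pc += 1;

    /* DECODE: the instruction fields name the operation and the registers. */
    switch (inst.opcode) {
    case OP_ADD: {
        /* EXECUTE: the "ALU" adds the two source registers. */
        int32_t result = cpu->regs[inst.src1] + cpu->regs[inst.src2];
        /* STORE (write-back): place the result in the destination register. */
        cpu->regs[inst.dst] = result;
        break;
    }
    }
}

int main(void) {
    const Instruction program[] = { { OP_ADD, 1, 2, 3 } };  /* ADD R1, R2, R3 */
    Cpu cpu = { .pc = 0, .regs = {0}, .memory = program };
    cpu.regs[2] = 10;
    cpu.regs[3] = 20;

    step(&cpu);
    printf("R1 = %d\n", cpu.regs[1]);   /* prints: R1 = 30 */
    return 0;
}
```

A real CPU does the same work in hardware, and its instruction fetch goes through the cache hierarchy shown in the table above.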
RAM Organization for a Program
Memory Layout of a Process
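A quick way to see the layout in practice is to print one address from each region, as in the sketch below. Segment placement and ordering depend on the OS, compiler, and address space layout randomization (ASLR), so treat the output only as an illustration of the segments explained next.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   // data segment
int uninitialized_global;      // BSS segment

void marker(void) {}           // machine code lives in the text segment

int main(void) {
    int local = 7;                        // stack
    int *heap = malloc(sizeof *heap);     // heap

    // Casting a function pointer to void* is a common but non-standard idiom.
    printf("text  (function)      : %p\n", (void *)&marker);
    printf("data  (init. global)  : %p\n", (void *)&initialized_global);
    printf("bss   (uninit. global): %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc)        : %p\n", (void *)heap);
    printf("stack (local)         : %p\n", (void *)&local);

    free(heap);
    return 0;
}
```

On a typical Linux build, the text address is lowest, data and BSS sit just above it, the heap comes next, and the stack lies near the top of the address space.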
Memory Segments Explained
1. Text/Code Segment
- Contains compiled machine code (instructions)
- Read-only and executable
- Shared among multiple instances of same program
- Fixed size at load time
2. Data Segment
- Initialized Data: Global and static variables with initial values
```c
int globalVar = 100;    // Stored in data segment
static int count = 0;   // Stored in data segment
```
3. BSS Segment (Block Started by Symbol)
- Uninitialized global and static variables
- Automatically initialized to zero
- Doesn't occupy space in executable file
```c
int globalArray[1000];  // Stored in BSS
static int flag;        // Stored in BSS
```
4. Heap
- Dynamic memory allocation
- Grows upward toward higher addresses
- Managed by the programmer (malloc/free, new/delete)
- Allocations persist until explicitly freed or the program ends
```c
int* ptr = malloc(sizeof(int) * 100);  // Allocated on heap
```
5. Stack
- Automatic memory allocation
- Grows downward toward lower addresses
- Stores local variables, function parameters, return addresses
- Automatically cleaned up when function returns
```c
void function() {
    int localVar = 10;  // Stored on stack
}
```
Variable Storage in Memory
Example Program Analysis
```c
#include <stdio.h>
#include <stdlib.h>

int globalVar = 100;             // Data segment
static int staticVar = 200;      // Data segment
int uninitGlobal;                // BSS segment

void function(int param) {       // param on stack
    int localVar = 10;           // Stack
    static int staticLocal = 5;  // Data segment
    int* heapVar = malloc(sizeof(int));  // Pointer on stack, data on heap
    *heapVar = 20;               // Value stored on heap

    printf("Address of param:    %p\n", (void*)&param);
    printf("Address of localVar: %p\n", (void*)&localVar);
    printf("Address of heapVar:  %p\n", (void*)heapVar);

    free(heapVar);
}

int main() {
    int mainLocal = 5;  // Stack
    function(mainLocal);
    return 0;
}
```
How CPU Executes Instructions
Assembly to Machine Code
CPU Registers During Execution
Complete Program Execution Example
Simple C Program
```c
int main() {
    int a = 5;
    int b = 10;
    int c = a + b;
    return c;
}
```
Step-by-Step Execution
Memory Access Pattern
Function Call Stack
How Function Calls Work
Function Call Example
```c
void function2(int x) {        // x is copied into function2's stack frame
    int local2 = x * 2;
    return;
}

void function1(int y) {        // y is copied into function1's stack frame
    int local1 = y + 1;
    function2(local1);         // pushes a new frame on top of function1's
    return;
}

int main() {
    int a = 5;
    function1(a);              // pushes function1's frame on top of main's
    return 0;
}
```
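One rough way to observe those frames is to print the address of a local variable in each function, as in the sketch below. On most mainstream platforms the addresses decrease as the calls nest, matching the "stack grows downward" rule, but comparing addresses across frames like this is for illustration only and is not something portable code should rely on.

```c
#include <stdio.h>

void function2(int x) {
    int local2 = x * 2;
    printf("function2 frame: %p (local2 = %d)\n", (void *)&local2, local2);
}

void function1(int y) {
    int local1 = y + 1;
    printf("function1 frame: %p\n", (void *)&local1);
    function2(local1);
}

int main(void) {
    int a = 5;
    printf("main frame:      %p\n", (void *)&a);
    function1(a);
    return 0;
}
```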
Dynamic Memory Allocation
Heap Management
malloc/free Process
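As a concrete sketch of the allocate-use-free lifecycle, including the failure check that real code needs, consider the following; the element count and values are arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t count = 100;

    // Request a block from the heap allocator.
    int *numbers = malloc(count * sizeof *numbers);
    if (numbers == NULL) {          // malloc can fail: always check
        perror("malloc");
        return EXIT_FAILURE;
    }

    // Use the block like an ordinary array.
    for (size_t i = 0; i < count; i++) {
        numbers[i] = (int)(i * i);
    }
    printf("numbers[10] = %d\n", numbers[10]);

    // Return the block to the allocator; the pointer is now dangling.
    free(numbers);
    numbers = NULL;                 // avoid accidental reuse

    return 0;
}
```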
CPU Pipeline
Modern CPUs Execute Multiple Instructions Simultaneously
Pipeline Stages
- Fetch (IF): Get instruction from memory
- Decode (ID): Interpret instruction and read registers
- Execute (EX): Perform operation in ALU
- Memory (MEM): Access memory if needed
- Writeback (WB): Write result to register
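The pipeline is pure hardware, but its effect can be felt from C: when every operation depends on the previous result, later instructions must wait, while independent operations can overlap in flight. The sketch below contrasts a single dependency chain with four independent partial sums; any actual speedup depends on the compiler and CPU, so treat it as an illustration rather than a benchmark.

```c
#include <stdio.h>
#include <stddef.h>

// One long dependency chain: each addition needs the previous sum,
// so later additions cannot start until earlier ones finish.
long sum_dependent(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

// Four independent partial sums: the additions within one iteration do not
// depend on each other, so the pipeline can keep several in flight at once.
long sum_independent(const long *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) {   // leftover elements
        s0 += a[i];
    }
    return s0 + s1 + s2 + s3;
}

int main(void) {
    long data[1000];
    for (size_t i = 0; i < 1000; i++) data[i] = (long)i;
    printf("%ld %ld\n", sum_dependent(data, 1000), sum_independent(data, 1000));
    return 0;
}
```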
Cache Memory
How Cache Works
Cache Line Example
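A cache line on most desktop CPUs is 64 bytes, so fetching one int also pulls its neighbors into the cache. The sketch below sums the same matrix in two orders: row-major order reuses each fetched line, while column-major order jumps 4 KiB between accesses and keeps missing. The matrix size, the 64-byte line, and the resulting slowdown are platform-dependent assumptions.

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int matrix[ROWS][COLS];  // stored row by row in memory

// Row-major: consecutive accesses fall in the same cache line.
long sum_row_major(void) {
    long sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += matrix[r][c];
    return sum;
}

// Column-major: each access lands 4 KiB away from the previous one,
// so nearly every access touches a different cache line.
long sum_col_major(void) {
    long sum = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += matrix[r][c];
    return sum;
}

int main(void) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            matrix[r][c] = 1;
    printf("row-major sum: %ld\n", sum_row_major());
    printf("col-major sum: %ld\n", sum_col_major());
    return 0;
}
```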
Virtual Memory
Virtual to Physical Address Translation
Page Table Structure
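The arithmetic behind translation is straightforward: split the virtual address into a page number and an offset, look the page number up in the page table to find a physical frame, then recombine. The sketch below models this with a flat single-level table and 4 KiB pages; real page tables are multi-level structures managed by the OS, so this is only an illustration.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   // 4 KiB pages (assumption)
#define NUM_PAGES 16u     // tiny toy address space

// Toy page-table entry: a valid bit plus the physical frame number.
typedef struct { int valid; uint32_t frame; } PageTableEntry;

static PageTableEntry page_table[NUM_PAGES];

// Translate a virtual address; returns 0 on a page fault.
int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr / PAGE_SIZE;   // virtual page number
    uint32_t offset = vaddr % PAGE_SIZE;   // offset stays the same

    if (page >= NUM_PAGES || !page_table[page].valid) {
        return 0;                          // page fault: the OS must step in
    }
    *paddr = page_table[page].frame * PAGE_SIZE + offset;
    return 1;
}

int main(void) {
    // Map virtual page 2 to physical frame 7.
    page_table[2].valid = 1;
    page_table[2].frame = 7;

    uint32_t paddr;
    if (translate(2 * PAGE_SIZE + 0x123, &paddr)) {
        printf("virtual 0x%X -> physical 0x%X\n",
               (unsigned)(2 * PAGE_SIZE + 0x123), (unsigned)paddr);
    }
    return 0;
}
```

On a miss (an invalid entry), the hardware raises a page fault and the operating system either maps the page in or terminates the process.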
Complete System View
Performance Comparison
Access Time Comparison
Human-Scale Time Analogy
If accessing a CPU register took 1 second, here's how long other operations would take:
| Memory Level | Actual Time | If Register = 1 Second |
|---|---|---|
| CPU Register | 0.3 ns | 1 second |
| L1 Cache | 1 ns | 3 seconds |
| L2 Cache | 3 ns | 10 seconds |
| L3 Cache | 10 ns | 33 seconds |
| RAM | 100 ns | 5.5 minutes |
| SSD | 50 μs | 1.9 days |
| HDD | 5 ms | 6.4 months |
Key Takeaways
- Speed vs Size Trade-off: Faster memory is exponentially more expensive and smaller
- Locality Matters: Programs that access nearby memory locations run faster due to caching
- Cache is Critical: Modern CPUs spend significant silicon area on cache to bridge the speed gap
- RAM is Slow: Despite being "fast" by human standards, RAM is ~100x slower than L1 cache
- Disk is Extremely Slow: SSDs are roughly 170,000x slower than registers; HDDs are about 16 million times slower